 
\frontmatter	  % Begin Roman style (i, ii, iii, iv...) page numbering

% Set up the Title Page
\title  {Data Integration over Distributed and Heterogeneous Data Endpoints}
 

\maketitle
%% ----------------------------------------------------------------

\setstretch{1.3}  % It is better to have smaller font and larger line spacing than the other way round

% Define the page headers using the FancyHdr package and set up for one-sided printing
\fancyhead{}  % Clears all page headers and footers
\rhead{\thepage}  % Sets the right side header to show the page number
\lhead{}  % Clears the left side page header

\pagestyle{fancy}  % Finally, use the "fancy" page style to implement the FancyHdr headers

 
\clearpage  % Declaration ended, now start a new page

%% ----------------------------------------------------------------
 
\pagestyle{empty}  % No headers or footers for the following pages
 
\begin{center}
\par
\bigskip
\bigskip
{\huge \bfseries Data Integration over Distributed and Heterogeneous Data Endpoints \par}
\bigskip
\bigskip
\bigskip
\bigskip
\bigskip
\bigskip
\bigskip
\bigskip
\bigskip
\bigskip
\bigskip
{\large \bfseries Proefschrift \par}
\bigskip
\bigskip

ter verkrijging van de graad van doctor\\
aan de Technische Universiteit Delft,\\
op gezag van de Rector Magnificus prof.ir. K.C.A.M. Luyben,\\
voorzitter van het College voor Promoties,\\
in het openbaar te verdedigen op dinsdag 4 februari 2014 om 15:00 uur\\
\bigskip
\bigskip
door \textbf{Samur Felipe CARDOSO DE ARAUJO}\\
\bigskip
\bigskip
Master of Science in Computer Science, Pontifical Catholic University of Rio de Janeiro,\\
geboren te Belo Horizonte, Minas Gerais, Brazil.
\end{center}
 
\vfill\vfill\vfill\vfill\vfill\vfill\null

\clearpage  % Title page ended, start a new page
%% ----------------------------------------------------------------

Dit proefschrift is goedgekeurd door de promotoren:\\
Prof.dr.ir. M.J.T. Reinders\\
Prof.dr.ir. A. P. de Vries
\bigskip

Samenstelling promotiecommissie:

\bigskip

\begin{tabular}{ll}
Rector Magnificus & voorzitter\\
Prof.dr.ir. M.J.T. Reinders  & Technische Universiteit Delft,  promotor\\
Prof.dr.ir. A. P. de Vries & Technische Universiteit Delft, supervisor \\
Prof.dr. D. Schwabe & Pontifical Catholic University of Rio de Janeiro\\
Prof. dr. ir. P.M.G. Apers  & University of Twente\\
Prof. dr. ir. A. van Deursen & Technische Universiteit Delft \\
Prof. dr. ir. F. van Harmelen & Vrije Universiteit Amsterdam \\
Assist. Prof. dr. ir. A.J.H. Hidders &  Technische Universiteit Delft \\
Prof.dr. A. Hanjalic & Technische Universiteit Delft (reservelid)\\ 
\end{tabular}
 \bigskip
  \bigskip
   \bigskip
    \bigskip
 
SIKS Dissertation Series No. 2014-08
\begin{figure}[h]
\includegraphics[scale=0.2]{./siks.pdf} 
\end{figure}

The research reported in this thesis has been carried out under the auspices of SIKS,
the Dutch Research School for Information and Knowledge Systems.

Published and distributed by: Samur Araujo\\
E-mail: samuraraujo@gmail.com

ISBN: 
\\
Keywords: data integration, semantic web, RDF, structured data, distributed querying, string transformation.

Copyright \copyright 2014 by Samur Araujo\\
All rights reserved. No part of the material protected by this copyright notice may
be reproduced or utilized in any form or by any means, electronic or mechanical, including
photocopying, recording or by any information storage and retrieval system,
without written permission of the author.

%% ----------------------------------------------------------------
\clearpage  % Copyright page ended, start a new page
% The "Funny Quote Page"
\pagestyle{empty}  % No headers or footers for the following pages

\null\vfill
% Now comes the "Funny Quote", written in italics
\textit{``To attain knowledge, add things every day. To attain wisdom, remove things every day.''}

\begin{flushright}
Lao Tzu
\end{flushright}

\vfill\vfill\vfill\vfill\vfill\vfill\null
\clearpage  % Funny Quote page ended, start a new page
%% ----------------------------------------------------------------
% The Abstract Page
\addtotoc{Summary}  % Add the "Abstract" page entry to the Contents
\summary{
Data integration is a broad area encompassing techniques to merge data from different data sources. Although there are plenty of efficient and effective methods for data integration over homogeneous data, where instances share the same schema and range of values, their application to heterogeneous data is less clear. This thesis considers data integration within the environment of the Semantic Web. More specifically, we propose a novel architecture for instance matching that takes into account the particularities of this heterogeneous and distributed setting. Instead of assuming that instances share the same schema, the proposed method operates even when there is no overlap between schemas, apart from a key label that matching instances must share. Moreover, we have considered the distributed nature of the Semantic Web to propose a new architecture for general data integration, which operates on-the-fly and in a pay-as-you-go fashion. We show that our view and the view of the traditional data integration school each address the problem only partially, but complement each other. We have observed that this unified view gives better insight into their relative importance and into how data integration methods can benefit from their combination. The results achieved in this work are of particular interest to the Semantic Web and Data Integration communities.
}
\clearpage  % Abstract ended, start a new page
% The Samenvatting (Dutch summary) page
\addtotoc{Samenvatting}  % Add the "Samenvatting" page entry to the Contents
\samenvatting{
Data-integratie is een breed gebied dat technieken omvat voor het samenvoegen van data uit verschillende gegevensbronnen. Alhoewel er genoeg efficiënte en effectieve methodes zijn die zich richten op data-integratie voor homogene data, waar instanties hetzelfde schema en bereik van waardes delen, is hun toepassing op heterogene data minder voor de hand liggend. Deze thesis beschouwt data-integratie binnen de context van het Semantic-Web. In het bijzonder introduceren wij een nieuwe architectuur voor instantie-matching die rekening houdt met de bijzonderheden van deze heterogene en gedistribueerde setting. In plaats van aan te nemen dat instanties hetzelfde schema delen werkt de voorgestelde methode zelfs als er geen overlap is tussen de schema’s met uitzondering van een identificerend label dat matchende instanties delen. Bovendien hebben we de gedistribueerde aard van het Semantic-Web in beschouwing genomen om een architectuur voor te stellen voor algemene data-integratie dat on-the-fly werkt volgens het pay-as-you-go principe. We laten zien dat onze visie en die van de traditionele data-integratie school beide slechts een deel van het probleem afdekken, maar gezamenlijk elkaar complementeren.  We hebben waargenomen dat deze geünificeerde visie een beter inzicht geeft in hun relatieve belang en hoe data-integratie kan profiteren van hun combinatie. De resultaten die in dit werk zijn bereikt zijn bijzonder interessant voor de Semantic-Web en Data-Integratie gemeenschappen.
}
\clearpage  % Abstract ended, start a new page
%% ----------------------------------------------------------------

\setstretch{1.3}  % Reset the line-spacing to 1.3 for body text (if it has changed)

% The Acknowledgements page, for thanking everyone
\acknowledgements{
\addtocontents{toc}{\vspace{1em}}  % Add a gap in the Contents, for aesthetics

There are no proper words to convey my deep gratitude and respect for my research promoters and supervisors, Professor Marcel Reinders and Professor Arjen de Vries. Thank you for your support and help in making this thesis possible.

A very special thanks goes out to Dr. Jan Hidders for his support and collaboration throughout my Ph.D.; our great discussions always helped me to improve my work. I would like to express my gratitude to my co-author Dr. Duc-Thanh Tran for sharing his broad knowledge of the field and for his extremely valuable coaching in writing research articles. I am deeply indebted to Professor Daniel Schwabe, who has supported me since the beginning of my research career. His co-supervision was fundamental to this thesis.

Appreciation also goes out to other members of EWI for their fellowship and collaboration: Bebei Hu, Erwin Leonard, Qi Gao, and Fabian Abel. Thanks also to all members of the DMIR lab for the excellent working environment. Thanks to Robbert for his technical assistance, and to Saskia and Esther for their secretarial support. I am also thankful to the CWI group for their fellowship. I would like to express my appreciation for the essential support of Ilse Oonk and Sophie Ronde during my Ph.D.

I deeply thank my parents and family for their unconditional support. Finally, I want to acknowledge the support of my beloved Ekaterina Churakova; without her love and patience, this thesis could not have been finished.

}
\clearpage  % End of the Acknowledgements
%% ----------------------------------------------------------------

\pagestyle{fancy}  %The page style headers have been "empty" all this time, now use the "fancy" headers as defined before to bring them back


%% ----------------------------------------------------------------
\lhead{\emph{Contents}}  % Set the left side page header to "Contents"
\tableofcontents  % Write out the Table of Contents

%% ----------------------------------------------------------------
\lhead{\emph{List of Figures}}  % Set the left side page header to "List of Figures"
\listoffigures  % Write out the List of Figures

%% ----------------------------------------------------------------
\lhead{\emph{List of Tables}}  % Set the left side page header to "List of Tables"
\listoftables  % Write out the List of Tables

%% ----------------------------------------------------------------
\setstretch{1.5}  % Set the line spacing to 1.5, this makes the following tables easier to read
\clearpage  % Start a new page
   
%% ----------------------------------------------------------------
% End of the preamble, contents and lists of things
% Begin the Dedication page

\setstretch{1.3}  % Return the line spacing back to 1.3

\pagestyle{empty}  % Page style needs to be empty for this page
\dedicatory{To my family.}

\addtocontents{toc}{\vspace{2em}}  % Add a gap in the Contents, for aesthetics

